Supervised projection approach for boosting classifiers
Author
Abstract
In this paper we present a new approach to boosting for the construction of ensembles of classifiers. The approach uses the distribution given by the weighting scheme of boosting to construct a non-linear supervised projection of the original variables, instead of using the weights of the instances to train the next classifier. With this method we construct ensembles that achieve a better generalization error and are more robust to noise. It has been proved that the AdaBoost method improves the margin of the instances achieved by the ensemble, and its practical success has been partially explained by this margin maximization property. However, in noisy problems, which are likely to occur in real-world applications, maximizing the margin of erroneous instances or outliers can lead to poor generalization. We propose an alternative approach, in which the distribution of the weights given by the boosting algorithm is used to obtain a supervised projection. The supervised projection is then used to train the next classifier with a uniform distribution over the training instances. The proposed approach is compared with three boosting techniques, namely AdaBoost, GentleBoost and MadaBoost, and shows improved performance on a large set of 55 problems from the UCI Machine Learning Repository, as well as lower sensitivity to noise in the class labels. The behavior of the proposed algorithm in terms of margin distribution and bias-variance decomposition is also studied.
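The loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It makes three assumptions: labels are in {-1, +1}, the weights are updated AdaBoost-style, and a linear projection onto the difference of weighted class means stands in for the paper's non-linear supervised projection. All function names (`weighted_projection`, `fit_stump`, `boost_with_projection`) are hypothetical.

```python
import math

def weighted_projection(X, y, w):
    """Stand-in supervised projection (assumption, not the paper's method):
    project onto the direction between the weighted class means."""
    d = len(X[0])
    mean = {}
    for label in (-1, 1):
        total = sum(wi for wi, yi in zip(w, y) if yi == label)
        mean[label] = [
            sum(wi * xi[j] for wi, xi, yi in zip(w, X, y) if yi == label) / total
            for j in range(d)
        ]
    direction = [mean[1][j] - mean[-1][j] for j in range(d)]
    return lambda x: sum(a * b for a, b in zip(direction, x))

def fit_stump(z, y):
    """Threshold classifier on 1-D values, trained with uniform instance weights."""
    values = sorted(set(z))
    thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for t in thresholds:
        for sign in (1, -1):
            errors = sum(1 for zi, yi in zip(z, y) if (sign if zi > t else -sign) != yi)
            if best is None or errors < best[0]:
                best = (errors, t, sign)
    _, t, sign = best
    return lambda v: sign if v > t else -sign

def boost_with_projection(X, y, rounds=5):
    """Boosting loop in which the weights shape the projection, not the classifier."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        # The boosting distribution is used only to build the projection.
        project = weighted_projection(X, y, w)
        # The base classifier sees the projected data with uniform weights.
        stump = fit_stump([project(x) for x in X], y)
        predict = lambda x, p=project, s=stump: s(p(x))
        preds = [predict(x) for x in X]
        # Weighted error and AdaBoost-style update (assumed weighting scheme).
        err = sum(wi for wi, pi, yi in zip(w, preds, y) if pi != yi)
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, predict))
        w = [wi * math.exp(-alpha * yi * pi) for wi, pi, yi in zip(w, preds, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return lambda x: 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1
```

Calling `boost_with_projection(X, y)` with `X` as a list of feature lists and `y` as a list of -1/+1 labels (both classes present) returns a function that classifies a single feature vector. The point of the sketch is the division of roles: hard instances influence the next round through the projection, while the classifier itself is never trained on a skewed distribution.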
Similar resources
Constructing ensembles of classifiers using supervised projection methods based on misclassified instances
In this paper we propose an approach for ensemble construction based on the use of supervised projections, both linear and non-linear, to achieve both accuracy and diversity of individual classifiers. The proposed approach uses the philosophy of boosting, putting more effort on difficult instances, but instead of learning the classifier on a biased distribution of the training set, it uses misc...
Improving BAS committee performance with a semi-supervised approach
Semi-supervised Learning is a machine learning approach that, by making use of both labeled and unlabeled data for training, can significantly improve learning accuracy. Boosting is a machine learning technique that combines several weak classifiers to improve the overall accuracy. At each iteration, the algorithm changes the weights of the examples and builds an additional classifier. A well k...
Multiclass Semi-supervised Boosting Using Different Distance Metrics
The goal of this thesis project is to build an effective multiclass classifier which can be trained with a small amount of labeled data and a large pool of unlabeled data by applying semi-supervised learning in a boosting framework. Boosting refers to a general method of producing a very accurate classifier by combining rough and moderately inaccurate classifiers. It has attracted a significant...
Constructing ensembles of classifiers using linear projections based on misclassified instances
In this paper we propose a novel approach for ensemble construction based on the use of linear projections to achieve both accuracy and diversity of individual classifiers. The proposed approach uses the philosophy of boosting, putting more effort on difficult instances, but instead of learning the classifier on a biased distribution of the training set it uses misclassified instances to find a...
Learning Structured Classifiers for Statistical Dependency Parsing
In this thesis, I present three supervised and one semi-supervised machine learning approach for improving statistical natural language dependency parsing. I first introduce a generative approach that uses a strictly lexicalised parsing model where all the parameters are based on words, without using any part-of-speech (POS) tags or grammatical categories. Then I present an improved large margi...
Journal title:
- Pattern Recognition
Volume 42, Issue
Pages -
Publication date: 2009